PHENOstruct: Prediction of human phenotype ontology terms using heterogeneous data sources
Abstract
The human phenotype ontology (HPO) was recently developed as a standardized vocabulary for describing the phenotype abnormalities associated with human diseases. At present, only a small fraction of human protein-coding genes have HPO annotations. However, researchers believe that a large portion of currently unannotated genes are related to disease phenotypes. It is therefore important to predict gene-HPO term associations using accurate computational methods. In this work we demonstrate the performance advantage, over several baseline methods, of the structured SVM approach, which was shown to be highly effective for Gene Ontology term prediction. Furthermore, we highlight a collection of informative data sources suitable for the problem of predicting gene-HPO associations, including large-scale literature mining data.
Similar resources
CombFunc: predicting protein function using heterogeneous data sources
Only a small fraction of known proteins have been functionally characterized, making protein function prediction essential to propose annotations for uncharacterized proteins. In recent years many function prediction methods have been developed using various sources of biological data from protein sequence and structure to gene expression data. Here we present the CombFunc web server, which mak...
Candidate gene prioritization with Endeavour
Genomic studies and high-throughput experiments often produce large lists of candidate genes among which only a small fraction are truly relevant to the disease, phenotype or biological process of interest. Gene prioritization tackles this problem by ranking candidate genes by profiling candidates across multiple genomic data sources and integrating this heterogeneous information into a global ...
Semantic search among heterogeneous biological databases based on gene ontology.
Semantic search is a key issue in integration of heterogeneous biological databases. In this paper, we present a methodology for implementing semantic search in BioDW, an integrated biological data warehouse. Two tables are presented: the DB2GO table to correlate Gene Ontology (GO) annotated entries from BioDW data sources with GO, and the semantic similarity table to record similarity scores d...
Ontologies improve cross-species phenotype analysis
As phenotype data analysis has become an important component of functional genomics, many methods for analyzing these data have been published in the recent past. For example, RNA interference (RNAi) in mice has significantly improved our understanding of gene regulation, even for human disease. However, as phenotypes are obtained through species-specific experiments, they are usually described...